Search results for "Sequence classification"
Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues
2021
DNA sequences are the basic data type processed in generic biological data analysis. One key component of that analysis is sequence classification, a methodology widely used to analyze sequential data of different kinds. However, applying it to DNA sequences requires a proper representation of those sequences, which is still an open research problem. Machine Learning (ML) methodologies have made a fundamental contribution to solving this problem. Among them, Deep Neural Network (DNN) models have recently shown strongly encouraging results. In this chapter, we deal with specific classification problems related to t…
A Quantum-Inspired Classifier for Early Web Bot Detection
2022
This paper introduces a novel approach, inspired by the principles of Quantum Computing, to web bot detection, framed as real-time classification of an incoming data stream of HTTP request headers, with the aim of achieving the shortest decision time with the highest accuracy. The proposed approach exploits the analogy between the intrinsic correlation of two or more particles and the dependence of each HTTP request on the preceding ones. Starting from the a-posteriori probability that each request belongs to a particular class, it is possible to assign a Qubit state representing a combination of these probabilities for all available observations of the time series. By levera…
A new feature selection strategy for K-mers sequence representation
2014
Decomposing a DNA sequence into k-mers (substrings of length k) and counting their frequencies defines a mapping of the sequence into a numerical space, represented by a numerical feature vector of fixed length. This simple process allows sequences to be compared in an alignment-free way, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition takes all the substrings of length k, which makes the dimension of the codomain exponential in k. This can obviously affect the time complexity of the similarity computation and, in general, of the machine learning algorithm used for sequence classification. Moreover, the presence of possible n…
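The k-mer mapping described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `kmer_counts`, the fixed alphabet "ACGT", the lexicographic ordering of the features, and the choice to skip windows containing other symbols (such as N) are all assumptions made here.

```python
from itertools import product


def kmer_counts(sequence, k, alphabet="ACGT"):
    """Map a sequence to a fixed-length vector of k-mer counts.

    The vector has len(alphabet) ** k entries, one per possible k-mer,
    ordered lexicographically with respect to `alphabet` (an assumption
    of this sketch, not something stated in the abstract).
    """
    # Assign each possible k-mer a fixed position in the feature vector.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    # Slide a window of length k over the sequence, one position at a time.
    for start in range(len(sequence) - k + 1):
        position = index.get(sequence[start:start + k])
        # Windows containing symbols outside the alphabet are skipped.
        if position is not None:
            counts[position] += 1
    return counts


vector = kmer_counts("ACGTAC", 2)
print(len(vector))  # 16 features for k = 2 over a 4-letter alphabet
```

The vector length is 4^k for the DNA alphabet, which is the exponential growth of the codomain that the abstract refers to.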
Alignment free Dissimilarities for sequence classification
2015
One way to represent a DNA sequence is to break it down into substrings of length L, called L-tuples, and count the occurrences of each L-tuple in the sequence. This representation defines a mapping of a sequence into a numerical space, represented by a numerical feature vector of fixed length, which makes it possible to measure sequence similarity in an alignment-free way simply by using dissimilarity functions between vectors. This work presents a benchmark study of 4 alignment-free dissimilarity functions between sequences, computed on their L-tuple representation, for the purpose of sequence classification. In our experiments, we have tested the classes of geometric-based, correlation-based and information-based …
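To make the three families named in this abstract concrete, the sketch below gives one textbook example of each, applied to two count vectors of equal length. The abstract is truncated and does not say which four functions the study benchmarked, so the choices here (Euclidean distance, one minus the Pearson correlation, and Jensen-Shannon divergence) and all function names are assumptions for illustration only.

```python
import math


def euclidean(u, v):
    """Geometric-based: straight-line distance between the two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_dissimilarity(u, v):
    """Correlation-based: 1 - Pearson correlation, ranging from 0 to 2."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        # Correlation is undefined for a constant vector; this sketch
        # treats that case as uncorrelated.
        return 1.0
    return 1.0 - cov / (norm_u * norm_v)


def jensen_shannon(u, v):
    """Information-based: Jensen-Shannon divergence in bits (0 to 1)."""
    total_u, total_v = sum(u), sum(v)
    if total_u == 0 or total_v == 0:
        raise ValueError("count vectors must not be all zeros")
    # Normalize the counts to relative frequencies.
    p = [a / total_u for a in u]
    q = [b / total_v for b in v]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


u, v = [2, 1, 0, 1], [1, 1, 1, 1]
print(euclidean(u, v), pearson_dissimilarity(u, v), jensen_shannon(u, v))
```

Any of these functions can be passed to a distance-based classifier such as nearest neighbour; which one performs best is the kind of question the benchmark in the abstract addresses.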
A New Feature Selection Methodology for K-mers Representation of DNA Sequences
2015
Decomposing a DNA sequence into k-mers and counting their frequencies defines a mapping of the sequence into a numerical space, represented by a numerical feature vector of fixed length. This simple process allows sequences to be compared in an alignment-free way, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition takes all the substrings of a fixed length k, which makes the dimension of the codomain exponential in k. This can obviously affect the time complexity of the similarity computation and, in general, of the machine learning algorithm used for sequence analysis. Moreover, the presence of possible noisy features can also affect the…